{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5ab60f84-39b3-4bdd-ae83-6527acb315e5",
   "metadata": {},
   "source": [
    "# LLMs and LlamaIndex ◦ April 2 2024 ◦ Ontario Teacher's Pension Plan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2276edc2-74ce-4d85-bdaa-0607c74fc981",
   "metadata": {},
   "outputs": [],
   "source": [
    "##################################################################\n",
    "# Venue: OTPP L&L\n",
    "# Talk: LLMs and LlamaIndex\n",
    "# Speaker: Andrei Fajardo\n",
    "##################################################################"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bee89b8-a04d-4326-9392-b9e7e1bcb8af",
   "metadata": {},
   "source": [
    "![Title Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/title.excalidraw.svg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4d38b38-ea48-4012-81ae-84e1d1f40a69",
   "metadata": {},
   "source": [
    "![Title Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/framework.excalidraw.svg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34d1f8e7-f978-4f19-bdfb-37c0d235b5bf",
   "metadata": {},
   "source": [
    "#### Notebook Setup & Dependency Installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f227a52a-a147-4e8f-b7d3-e03f983fd5f1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3.11 -m pip install --upgrade pip\u001b[0m\n",
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "%pip install llama-index-vector-stores-qdrant -q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7bc383fc-19b2-47b5-af61-e83210ea9c37",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d56aba9c-230b-464d-909c-de17d1bc0aa6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mkdir: data: File exists\n",
      "--2024-04-03 11:23:36--  https://www.otpp.com/content/dam/otpp/documents/reports/2023-ar/otpp-2023-annual-report-eng.pdf\n",
      "Resolving www.otpp.com (www.otpp.com)... 67.210.219.20\n",
      "Connecting to www.otpp.com (www.otpp.com)|67.210.219.20|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 10901939 (10M) [application/pdf]\n",
      "Saving to: ‘./data/otpp-2023-annual-report-eng.pdf’\n",
      "\n",
      "./data/otpp-2023-an 100%[===================>]  10.40M  1.01MB/s    in 10s     \n",
      "\n",
      "2024-04-03 11:23:49 (1.02 MB/s) - ‘./data/otpp-2023-annual-report-eng.pdf’ saved [10901939/10901939]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir data\n",
    "!wget \"https://www.otpp.com/content/dam/otpp/documents/reports/2023-ar/otpp-2023-annual-report-eng.pdf\" -O \"./data/otpp-2023-annual-report-eng.pdf\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "275c00f1-e358-498a-88c3-8e810a5a2546",
   "metadata": {},
   "source": [
    "## Motivation\n",
    "\n",
    "![Motivation Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/motivation.excalidraw.svg)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25d4ce76-8eea-44cb-aa99-94844dfed9c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# query an LLM and ask it about Mistplay\n",
    "from llama_index.llms.openai import OpenAI\n",
    "\n",
    "llm = OpenAI(model=\"gpt-4-turbo-preview\")\n",
    "response = llm.complete(\"What is Ontario Teacher's Pension Plan all about?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3f18489-4f25-40ce-86e9-697ddea7d6c6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The Ontario Teachers' Pension Plan (OTPP) is one of the world's largest pension funds, serving the public school teachers of Ontario, Canada. Established in 1990, it operates as an independent organization responsible for administering defined-benefit pensions for school teachers of the province. The OTPP is jointly sponsored by the Government of Ontario and the Ontario Teachers' Federation, meaning both the government and the teachers' union have a say in the management and direction of the fund.\n",
      "\n",
      "The primary purpose of the OTPP is to provide a stable and reliable source of retirement income for its members. It does so by collecting contributions from both teachers and their employers (the government) and investing those funds in a wide variety of assets, including stocks, bonds, real estate, and infrastructure projects, both domestically and internationally. The goal is to generate sufficient returns to ensure the long-term sustainability of the pension plan while managing risk.\n",
      "\n",
      "The OTPP is notable for its size, investment success, and innovative approach to pension management. It has been a pioneer in direct investment, taking significant stakes in companies, real estate, and infrastructure projects around the world. This direct investment strategy has allowed the OTPP to reduce reliance on external fund managers, thereby lowering costs and potentially increasing returns.\n",
      "\n",
      "The plan's governance model is designed to ensure stability and sustainability. The board of directors, which oversees the operation of the OTPP, includes representatives from both the government and the Ontario Teachers' Federation. This joint sponsorship model ensures that both the interests of the teachers and the fiscal responsibilities of the government are taken into account in the plan's management.\n",
      "\n",
      "In summary, the Ontario Teachers' Pension Plan is a critical component of the retirement security system for Ontario's public school teachers, distinguished by its large scale, sophisticated investment strategy, and joint governance structure. It aims to deliver secure and sustainable pensions to its members, contributing significantly to their financial well-being in retirement.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80b2f9c3-1e07-4178-9e7e-220355725955",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = llm.complete(\n",
    "    \"According to the 2023 annual report, how many billions of dollars in net assets does Ontario Teacher's Pension Plan hold?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a1c6424-cf2f-4ee1-b601-674b03354e35",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "As of my last update in April 2023, the Ontario Teachers' Pension Plan (OTPP) reported holding net assets of CAD 247.5 billion in its 2022 annual report. Please note that this information might have changed after my last update, and I would recommend checking the latest annual report or official OTPP sources for the most current figures.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "632e1dec-045b-42fb-b325-c25e1ec4bb75",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = llm.complete(\n",
    "    \"According to the 2023 annual report, what is the 10-year total-fund net return?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53efac56-7966-407f-a298-bc2e0e14518e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "As of my last update in April 2023, I don't have access to real-time data or specific annual reports from 2023. Therefore, I cannot provide the 10-year total-fund net return from any specific 2023 annual report. This information would typically be found directly in the report of the specific fund or investment you're interested in. I recommend checking the official website or contacting the financial institution directly for the most accurate and up-to-date information.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04a0ef8d-d55c-4b64-887b-18d343503a76",
   "metadata": {},
   "source": [
    "## Basic RAG in 3 Steps\n",
    "\n",
    "![Divider Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/subheading.excalidraw.svg)\n",
    "\n",
    "\n",
    "1. Build external knowledge (i.e., uploading updated data sources)\n",
    "2. Retrieve\n",
    "3. Augment and Generate"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "598a5257-20ae-468e-85d6-d4e8c46b8cb5",
   "metadata": {},
   "source": [
    "## 1. Build External Knowledge\n",
    "\n",
    "![Divider Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/step1.excalidraw.svg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2963f90-9da5-4a0d-8dbe-f16fcb8627a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"Load the data.\n",
    "\n",
    "With llama-index, before any transformations are applied,\n",
    "data is loaded in the `Document` abstraction, which is\n",
    "a container that holds the text of the document.\n",
    "\"\"\"\n",
    "\n",
    "from llama_index.core import SimpleDirectoryReader\n",
    "\n",
    "loader = SimpleDirectoryReader(input_dir=\"./data\")\n",
    "documents = loader.load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da321e2c-8428-4c04-abf2-b204416e816f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Investing to  \\nmake a mark\\n2023 Annual Report'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# if you want to see what the text looks like\n",
    "documents[0].text[:1000]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4801e74a-8c52-45c4-967d-7a1a94f54ad3",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"Chunk, Encode, and Store into a Vector Store.\n",
    "\n",
    "To streamline the process, we can make use of the IngestionPipeline\n",
    "class that will apply your specified transformations to the\n",
    "Document's.\n",
    "\"\"\"\n",
    "\n",
    "from llama_index.core.ingestion import IngestionPipeline\n",
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.vector_stores.qdrant import QdrantVectorStore\n",
    "import qdrant_client\n",
    "\n",
    "client = qdrant_client.QdrantClient(location=\":memory:\")\n",
    "vector_store = QdrantVectorStore(client=client, collection_name=\"test_store\")\n",
    "\n",
    "pipeline = IngestionPipeline(\n",
    "    transformations=[\n",
    "        SentenceSplitter(),\n",
    "        OpenAIEmbedding(),\n",
    "    ],\n",
    "    vector_store=vector_store,\n",
    ")\n",
    "_nodes = pipeline.run(documents=documents, num_workers=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02afea25-098b-49c7-a965-21c7576757af",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Investing to  \\nmake a mark\\n2023 Annual Report'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# if you want to see the nodes\n",
    "# len(_nodes)\n",
    "_nodes[0].text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44cd8a86-089d-4329-9484-35b98b3a26f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"Create a llama-index... wait for it... Index.\n",
    "\n",
    "After uploading your encoded documents into your vector\n",
    "store of choice, you can connect to it with a VectorStoreIndex\n",
    "which then gives you access to all of the llama-index functionality.\n",
    "\"\"\"\n",
    "\n",
    "from llama_index.core import VectorStoreIndex\n",
    "\n",
    "index = VectorStoreIndex.from_vector_store(vector_store=vector_store)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "286b1827-7547-49c6-aba3-82f08d6d86b8",
   "metadata": {},
   "source": [
    "## 2. Retrieve Against A Query\n",
    "\n",
    "![Step2 Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/step2.excalidraw.svg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49f86af1-db08-4641-89ad-d60abd04e6b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"Retrieve relevant documents against a query.\n",
    "\n",
    "With our Index ready, we can now query it to\n",
    "retrieve the most relevant document chunks.\n",
    "\"\"\"\n",
    "\n",
    "retriever = index.as_retriever(similarity_top_k=2)\n",
    "retrieved_nodes = retriever.retrieve(\n",
    "    \"According to the 2023 annual report, what is the 10-year total-fund net return?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05f9ce3b-a4e3-4862-b58c-2d9fba1f9abc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stable long-term total-fund returns \n",
      "1 Net assets include investment assets less investment liabilities (net investments), plus the receivables from the Province of Ontario, and \n",
      "other assets less other liabilities.\n",
      "2 A real rate of return is the net return, or annual percentage of profit earned on an investment, adjusted for inflation.Ontario Teachers’ investment program is tailored to \n",
      "generate strong and steady risk-adjusted returns to \n",
      "pay members’ pensions over generations, while also \n",
      "havi\n"
     ]
    }
   ],
   "source": [
    "# to view the retrieved node\n",
    "print(retrieved_nodes[0].text[:500])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "978ae2c5-8c2a-41c7-a2eb-85a5562f2db5",
   "metadata": {},
   "source": [
    "## 3. Generate Final Response\n",
    "\n",
    "![Step3 Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/step3.excalidraw.svg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef33c349-eed4-4e35-9b5d-9473adf2ce01",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"Context-Augemented Generation.\n",
    "\n",
    "With our Index ready, we can create a QueryEngine\n",
    "that handles the retrieval and context augmentation\n",
    "in order to get the final response.\n",
    "\"\"\"\n",
    "\n",
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4139c48a-ece8-4244-b4eb-7cff74cb1325",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Context information is below.\n",
      "---------------------\n",
      "{context_str}\n",
      "---------------------\n",
      "Given the context information and not prior knowledge, answer the query.\n",
      "Query: {query_str}\n",
      "Answer: \n"
     ]
    }
   ],
   "source": [
    "# to inspect the default prompt being used\n",
    "print(\n",
    "    query_engine.get_prompts()[\n",
    "        \"response_synthesizer:text_qa_template\"\n",
    "    ].default_template.template\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6179639d-af96-4a09-b440-b47ad599a26f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The 10-year total-fund net return, as stated in the 2023 annual report, is 7.6%.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"According to the 2023 annual report, what is the 10-year total-fund net return?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a050bb7f-266c-46c7-8773-eccb909e5fd6",
   "metadata": {},
   "source": [
    "## Beyond Basic RAG: Improved PDF Parsing with LlamaParse\n",
    "\n",
    "To use LlamaParse, you first need to obtain an API Key. Visit [llamacloud.ai](https://cloud.llamaindex.ai/login) to login (or sign-up) and get an api key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "281b527d-a305-4432-b34f-811860ed6166",
   "metadata": {},
   "outputs": [],
   "source": [
    "api_key = \"<FILL-IN>\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4a08bc5-1ff9-42f5-91a3-f77f3a529ff1",
   "metadata": {},
   "source": [
    "### The default pdf reader (PyPDF), like many out-of-the box pdf parsers struggle on complex PDF docs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32b6d6bd-f04a-400a-99ac-b9c0308c7554",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Steve McGirr, Chair of the Board, attended all board meetings, which totaled 11 full meetings, including one strategic offsite meeting.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"How many board meetings did Steve McGirr, Chair of the Board, attend?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db781501-1e4f-4c2f-8ba5-aef3ab9c693e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "55% of board members identify as women.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"What percentage of board members identify as women?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40e70776-611d-4b52-bd89-3f233cdaa9c5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The total investment percentage in Canada as of December 31, 2023, is 29% (10% in public equity and 19% in inflation-sensitive investments).\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"What is the total investment percentage in Canada as of December 31, 2023?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3895970c-dae2-4ae3-b48d-72da7617fcd3",
   "metadata": {},
   "source": [
    "### Improved PDF Parsing using LlamaParse"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27556916-9e45-4bfe-b1d9-ea12ed4bab41",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_parse import LlamaParse\n",
    "from llama_index.core.node_parser import MarkdownElementNodeParser\n",
    "from llama_index.llms.openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad81e2cf-90b6-49af-b36e-2f896abec0c8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started parsing the file under job_id ddcdc5f9-bd16-40b8-90f2-353f2a2b6450\n"
     ]
    }
   ],
   "source": [
    "from llama_parse import LlamaParse\n",
    "\n",
    "parser = LlamaParse(result_type=\"markdown\")\n",
    "md_documents = parser.load_data(\n",
    "    file_path=\"./data/otpp-2023-annual-report-eng.pdf\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c49c7404-1ff9-4cdc-bcc1-5a43378f21ac",
   "metadata": {},
   "outputs": [],
   "source": [
    "# save to an .md file\n",
    "with open(\"./mds/parsed.md\", \"w\") as f:\n",
    "    f.write(md_documents[0].text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3c84a614-b916-463b-bf58-ba2a4bced4b3",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "133it [00:00, 70142.39it/s]\n",
      "100%|███████████████████████████████████████████████████████████████████████████████████| 133/133 [01:13<00:00,  1.80it/s]\n"
     ]
    }
   ],
   "source": [
    "md_node_parser = MarkdownElementNodeParser(\n",
    "    llm=OpenAI(model=\"gpt-4.5-turbo-preview\"),\n",
    "    num_workers=3,\n",
    "    include_metadata=True,\n",
    ")\n",
    "md_nodes = md_node_parser.get_nodes_from_documents(md_documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "efaed5ae-9c87-4670-b571-9dff91b06cea",
   "metadata": {},
   "outputs": [],
   "source": [
    "llama_parse_index = VectorStoreIndex.from_documents(md_documents)\n",
    "llama_parse_query_engine = llama_parse_index.as_query_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e54ff4a-22d8-45b4-b27c-84387ff7b4bb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Steve McGirr, Chair of the Board, attended 11 board meetings.\n"
     ]
    }
   ],
   "source": [
    "response = llama_parse_query_engine.query(\n",
    "    \"How many board meetings did Steve McGirr, Chair of the Board, attend?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bdf74b49-a64e-4de2-a761-e117123ff069",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "55% of board members identify as women.\n"
     ]
    }
   ],
   "source": [
    "response = llama_parse_query_engine.query(\n",
    "    \"What percentage of board members identify as women?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "328ee422-cd6d-4d5d-bfe9-da5e878f85a5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The total investment percentage in Canada as of December 31, 2023, is 35%.\n"
     ]
    }
   ],
   "source": [
    "response = llama_parse_query_engine.query(\n",
    "    \"What is the total investment percentage in Canada as of December 31, 2023?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dae63946-be38-4807-af2a-8113661a806b",
   "metadata": {},
   "source": [
    "## In Summary\n",
    "\n",
    "- LLMs as powerful as they are, don't perform too well with knowledge-intensive tasks (domain specific, updated data, long-tail)\n",
    "- Context augmentation has been shown (in a few studies) to outperform LLMs without augmentation\n",
    "- In this notebook, we showed one such example that follows that pattern."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53bfd859-924a-4748-b4d8-ee0f2dc52e3b",
   "metadata": {},
   "source": [
    "## Data Extraction"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01fb77fd-f4e8-42b0-9704-d2229f4e66a8",
   "metadata": {},
   "source": [
    "![DataExtractions](https://media.licdn.com/dms/image/D4E22AQGwPmZ5RRhbyA/feedshare-shrink_1280/0/1711823067172?e=1715212800&v=beta&t=fJtksPZ3Fm-BOrKRCwa6BrYyuxlNFDuop3ZSopMl53M)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a63649cf-7f3a-4f6d-a383-92163f71b03c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from llama_index.core.bridge.pydantic import BaseModel, Field\n",
    "\n",
    "from llama_index.program.openai import OpenAIPydanticProgram\n",
    "from llama_index.llms.openai import OpenAI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f306da0-4da6-45b5-b9b3-50d8d0b9883a",
   "metadata": {},
   "source": [
    "### Leadership Team"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b37bc7e7-4ee8-40e8-891f-0307997049a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "class LeadershipTeam(BaseModel):\n",
    "    \"\"\"Data model for leadership team.\"\"\"\n",
    "\n",
    "    ceo: str = Field(description=\"The CEO\")\n",
    "    coo: str = Field(description=\"The Chief Operating Officer\")\n",
    "    cio: str = Field(description=\"Chief Investment Officer\")\n",
    "    chief_pension_officer: str = Field(description=\"Chief Pension Officer\")\n",
    "    chief_legal_officer: str = Field(\n",
    "        description=\"Chief Legal & Corporate Affairs Officer\"\n",
    "    )\n",
    "    chief_people_officer: str = Field(description=\"Chief People Officer\")\n",
    "    chief_strategy_officer: str = Field(description=\"Chief Strategy Officer\")\n",
    "    executive_managing_director: str = Field(\n",
    "        description=\"Executive Managing Director\"\n",
    "    )\n",
    "    chief_investment_officer: str = Field(\n",
    "        description=\"Chief Investment Officer\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec28d600-5001-4176-bd9f-5982c05d02a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt_template_str = \"\"\"\\\n",
    "Here is the 2023 Annual Report for Ontario Teacher's Pension Plan:\n",
    "{document_text}\n",
    "Provide the names of the Leadership Team.\n",
    "\"\"\"\n",
    "\n",
    "program = OpenAIPydanticProgram.from_defaults(\n",
    "    output_cls=LeadershipTeam,\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    llm=OpenAI(\"gpt-4-turbo-preview\"),\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0de616a2-c96a-4be0-b8ba-04e537d03e61",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Function call: LeadershipTeam with args: {\"ceo\":\"Jo Taylor\",\"coo\":\"Tracy Abel\",\"cio\":\"Gillian Brown\",\"chief_pension_officer\":\"Charley Butler\",\"chief_legal_officer\":\"Sharon Chilcott\",\"chief_people_officer\":\"Jeff Davis\",\"chief_strategy_officer\":\"Jonathan Hausman\",\"executive_managing_director\":\"Nick Jansa\",\"chief_investment_officer\":\"Stephen McLennan\"}\n"
     ]
    }
   ],
   "source": [
    "leadership_team = program(document_text=md_documents[0].text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "41f8b4b0-0f19-4101-9642-1b5bc555dfac",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"ceo\": \"Jo Taylor\",\n",
      "    \"coo\": \"Tracy Abel\",\n",
      "    \"cio\": \"Gillian Brown\",\n",
      "    \"chief_pension_officer\": \"Charley Butler\",\n",
      "    \"chief_legal_officer\": \"Sharon Chilcott\",\n",
      "    \"chief_people_officer\": \"Jeff Davis\",\n",
      "    \"chief_strategy_officer\": \"Jonathan Hausman\",\n",
      "    \"executive_managing_director\": \"Nick Jansa\",\n",
      "    \"chief_investment_officer\": \"Stephen McLennan\"\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "print(json.dumps(leadership_team.dict(), indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc857227-3fed-4bb6-a062-99ea3c55e294",
   "metadata": {},
   "source": [
    "# LlamaIndex Has More To Offer\n",
    "\n",
    "- Data infrastructure that enables production-grade, advanced RAG systems\n",
    "- Agentic solutions\n",
    "- Newly released: `llama-index-networks`\n",
    "- Enterprise offerings (alpha):\n",
    "    - LlamaParse (proprietary complex PDF parser) and\n",
    "    - LlamaCloud"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17c1c027-be8b-48f4-87ee-06f3e2c71797",
   "metadata": {},
   "source": [
    "### Useful links\n",
    "\n",
    "[website](https://www.llamaindex.ai/) ◦ [llamahub](https://llamahub.ai) ◦ [llamaparse](https://github.com/run-llama/llama_parse) ◦ [github](https://github.com/run-llama/llama_index) ◦ [medium](https://medium.com/@llama_index) ◦ [rag-bootcamp-poster](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/final_poster.excalidraw.svg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "otpp",
   "language": "python",
   "name": "otpp"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
